Transcriptomics part 2: Differential expression analysis
Eötvös Loránd University, Budapest & Biological Research Centre, Szeged
April 4, 2025
Extract expressed RNA, sequencing → fastq file
Pre-mapping quality checking, trimming (filtering)
Read mapping to reference genome OR de novo assembly of transcripts
Read counting
Quantitative or Differential Expression Analyses: comparing expression levels
Functional enrichment analysis: GO, pathways…
© Mark Robinson
\(Y_{i}\sim Binomial\left ( M,\lambda_{i} \right )\)
\(Y_{i} =\) the observed number of reads for gene \(i\)
\(M =\) the total number of reads (library size)
\(\lambda_{i} =\) proportion (relative abundance) of gene \(i\) reads (lambda)
Large \(M\), small \(\lambda_{i}\) → approximated well by Poisson \(\left ( \mu _{i} = M\cdot \lambda _{i} \right )\) (mu)
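The quality of the Poisson approximation can be checked numerically. The following is a minimal Python sketch, not part of the original slides: the values of M and lambda_i are invented for illustration, and the helper names `binomial_pmf` and `poisson_pmf` are hypothetical.

```python
import math

# Illustrative values only (assumptions, not taken from the slides)
M = 1_000_000        # library size: total number of reads
lambda_i = 1e-5      # relative abundance of gene i
mu_i = M * lambda_i  # Poisson mean, mu_i = M * lambda_i

def binomial_pmf(k, n, p):
    """P(Y = k) for Y ~ Binomial(n, p), evaluated in log space for stability."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log1p(-p))
    return math.exp(log_pmf)

def poisson_pmf(k, mu):
    """P(Y = k) for Y ~ Poisson(mu), evaluated in log space for stability."""
    log_pmf = k * math.log(mu) - mu - math.lgamma(k + 1)
    return math.exp(log_pmf)

# Largest pointwise gap between the two distributions over a range of counts
max_abs_diff = max(
    abs(binomial_pmf(k, M, lambda_i) - poisson_pmf(k, mu_i))
    for k in range(0, 41)
)
print(f"mu_i = {mu_i:.1f}, largest pmf difference for k = 0..40: {max_abs_diff:.2e}")
```

With a large M and a small lambda_i the two probability mass functions are nearly indistinguishable, which is why the simpler Poisson form is used as the starting point.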
Image credit: Klaus B., EMBO J (2015) 34: 2727-2730
What we see in real data
\(M =\) the total number of reads (library size)
\(\lambda_{i} =\) proportion (relative abundance) of gene \(i\) reads (lambda)
\[ Y_{i}\sim Poisson\left ( M\cdot \lambda_{i} \right ) \] \[ mean\left ( Y_{i} \right ) = variance \left ( Y_{i} \right ) = M\cdot \lambda_{i} \]
\(\varphi =\) dispersion parameter (phi)
\[ Y_{i}\sim NB\left ( \mu_{i} = M\cdot \lambda_{i}, \varphi_{i} \right ) \] Same mean, but the variance is larger (quadratic in the mean):
\[ variance\left ( Y_{i} \right ) = \mu_{i} \left ( 1 + \mu_{i} \varphi_{i} \right ) \]
Critical parameter to estimate: \(\varphi_{i} =\) dispersion
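To see how much the dispersion matters, the two variance formulas can be evaluated side by side. This is a small Python sketch under stated assumptions: the dispersion value 0.1 and the means are invented, and the function names are hypothetical.

```python
def poisson_variance(mu):
    """Poisson model: the variance equals the mean."""
    return mu

def nb_variance(mu, phi):
    """Negative binomial model: variance = mu * (1 + mu * phi)."""
    return mu * (1 + mu * phi)

phi = 0.1  # hypothetical dispersion, chosen only for illustration

for mu in (10, 100, 1000):
    print(f"mean = {mu:5d}  Poisson variance = {poisson_variance(mu):8.1f}  "
          f"NB variance = {nb_variance(mu, phi):10.1f}")
```

With \(\varphi_{i} = 0\) the negative binomial variance reduces to the Poisson variance. For highly expressed genes the quadratic term dominates, so the squared coefficient of variation, \(1/\mu_{i} + \varphi_{i}\), approaches \(\varphi_{i}\).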
Addressing library size biases
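One common way to address library size differences is to rescale raw counts to counts per million (CPM). The slides do not state which method they have in mind, so treating CPM as the example is an assumption; the sample names and counts below are invented, and `counts_per_million` is a hypothetical helper.

```python
# Invented raw counts per sample (gene -> read count); sample_B was simply
# sequenced twice as deeply as sample_A.
counts = {
    "sample_A": {"gene1": 100, "gene2": 900},
    "sample_B": {"gene1": 200, "gene2": 1800},
}

def counts_per_million(sample_counts):
    """Scale each gene's count by the sample's library size, times one million."""
    library_size = sum(sample_counts.values())
    return {gene: count / library_size * 1e6
            for gene, count in sample_counts.items()}

cpm = {sample: counts_per_million(c) for sample, c in counts.items()}

for sample, values in cpm.items():
    print(sample, values)
```

After scaling, the two samples give the same value for each gene, because the only difference between them was sequencing depth.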
Addressing library size and gene length biases
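Longer genes collect more reads at the same expression level, so measures that also divide by gene length are often used. The sketch below computes transcripts per million (TPM) as one such measure; RPKM/FPKM is a related alternative. Which measure the slides intend is not stated, so this choice is an assumption, and the gene names, counts and lengths are invented.

```python
def transcripts_per_million(counts, lengths_bp):
    """Length-normalize counts first, then scale so the values sum to one million."""
    # Reads per kilobase of gene length
    rate = {gene: counts[gene] / (lengths_bp[gene] / 1000) for gene in counts}
    scale = sum(rate.values()) / 1e6
    return {gene: r / scale for gene, r in rate.items()}

# Invented example: geneB is twice as long as geneA but has the same count
counts = {"geneA": 500, "geneB": 500, "geneC": 1000}
lengths_bp = {"geneA": 1000, "geneB": 2000, "geneC": 2000}

tpm = transcripts_per_million(counts, lengths_bp)
for gene, value in tpm.items():
    print(f"{gene}: {value:.0f}")
```

In this toy example geneA and geneB have identical raw counts, but geneA ends up with twice the TPM of geneB because it is half as long.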
2 groups:
More treatment types, more groups:
MA plot: transforming the data onto \(M\) (log ratio) and \(A\) (mean average) scales
Image credit: Yin T, et al., Genome Biology (2012) 13: R77
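The M and A coordinates can be computed directly from two expression values for the same gene. This is a minimal sketch: the pseudocount of 1 (added to avoid taking the logarithm of zero) is an assumption, and `ma_values` is a hypothetical helper name.

```python
import math

def ma_values(x1, x2, pseudocount=1.0):
    """Return (M, A) for one gene measured in two conditions.

    M is the log2 ratio and A is the average of the two log2 values.
    """
    log_x1 = math.log2(x1 + pseudocount)
    log_x2 = math.log2(x2 + pseudocount)
    m = log_x1 - log_x2
    a = (log_x1 + log_x2) / 2
    return m, a

# Invented normalized expression values for one gene in two conditions
m, a = ma_values(7, 1)
print(f"M = {m}, A = {a}")
```

Genes with no change between conditions sit at M = 0, and the A axis orders genes from lowly to highly expressed.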
Volcano plot
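A volcano plot places each gene at its log2 fold change (x axis) and the negative log10 of its p-value (y axis). The sketch below computes these coordinates only and does no plotting. The result values and the cutoffs are invented for illustration, and the helper names are hypothetical; in practice, p-values adjusted for multiple testing are usually used.

```python
import math

# Invented results: (gene, log2 fold change, p-value)
results = [
    ("gene1", 2.5, 1e-8),
    ("gene2", -0.3, 0.4),
    ("gene3", -1.8, 1e-4),
]

def volcano_point(log2_fc, p_value):
    """Return the (x, y) position of one gene on a volcano plot."""
    return log2_fc, -math.log10(p_value)

def is_highlighted(log2_fc, p_value, fc_cutoff=1.0, p_cutoff=0.05):
    """Flag genes passing both an effect-size and a significance cutoff.

    Both cutoffs are illustrative defaults, not values from the slides.
    """
    return abs(log2_fc) >= fc_cutoff and p_value < p_cutoff

for gene, log2_fc, p_value in results:
    x, y = volcano_point(log2_fc, p_value)
    print(f"{gene}: x = {x:+.2f}, y = {y:.2f}, "
          f"highlighted = {is_highlighted(log2_fc, p_value)}")
```

Genes with both a large change and a small p-value end up in the upper left and upper right corners of the plot.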